Algorithm exploitation: Humans are keen to exploit benevolent AI
Authors
Abstract
• People predict that AI agents will be as benevolent (cooperative) as humans
• People cooperate less with AI agents than with humans
• Reduced cooperation only occurs if it serves people's selfish interests
• People feel guilty when they exploit humans but not when they exploit AI

We cooperate with other people despite the risk of being exploited or hurt. If future artificial intelligence (AI) systems are benevolent and cooperative toward us, what will we do in return? Here we show that our cooperative dispositions are weaker when we interact with AI. In nine experiments, humans interacted with either another human or an AI agent in four classic social dilemma economic games and a newly designed game of Reciprocity that we introduce here. Contrary to the hypothesis that people mistrust algorithms, participants trusted their AI partners to be as cooperative as humans. However, they did not return AI's benevolence as much and exploited the AI more than humans. These findings warn that self-driving cars and co-working robots, whose success depends on humans' returning their cooperativeness, run the risk of being exploited. This vulnerability calls not just for smarter machines but also for better human-centered policies.

Imagine yourself stuck in traffic as you drive out of the city for the weekend. Ahead of you, someone wants to join the queue from a side road. Will you stop and let them in, or push forward, hoping that someone else behind you will stop? What would you do if this was a driverless car with no passengers? As machines acquire capacities to decide autonomously, we switch from being omnipotent users of intelligent tools (e.g., Google Translate) to having to make decisions beside them in interactive settings, such as sharing the road with autonomous cars (Bonnefon et al., 2016; Chater et al., 2018, 2019; Crandall et al., 2018; Rahwan et al., 2019).

Unlike chess, Go, or StarCraft II, in which AI has already outperformed humans (Campbell et al., 2002; Silver et al., 2016; Vinyals et al., 2019), most day-to-day interactions are not zero-sum, where one player's win is the other one's loss. Instead, they offer opportunities for the attainment of mutual gains (Colman, 1999). Cooperation often requires compromise and a willingness to take risks: one may have to sacrifice some personal interests to benefit the group and expose oneself to the possibility that others do not cooperate. As amply evidenced in behavioral game theory, humans often choose to cooperate with others, even in anonymous one-shot encounters in which acting selfishly bears no risk of damaging one's reputation (Battalio et al., 2001; Camerer, 2003; Johnson and Mislin, 2011; McCabe et al., 2003; Rand et al., 2012; Rubinstein and Salant, 2016). The question we raise here is whether people will continue to do so when interacting with AI agents, and economic games are useful tools to empirically test such cooperativeness.
Recent work has shown that when groups of human decision-makers face collective problems, a few bots can aid coordination between humans and improve group outcomes (Shirado and Christakis, 2017; Shirado et al., 2020). This does not mean, however, that people are willing to cooperate in one-to-one interactions in which machines are involved. Few studies have used economic games to analyze social dilemmas in human-machine pairs (Sandoval et al., 2016; Torta et al., 2013), and evidence from related paradigms is mixed too. For instance, people seem to lose trust in algorithmic forecasting after observing algorithmic predictions err, more than they do after observing their own errors (Dietvorst et al., 2015), and people tend to reciprocate favors from machines less than favors from other people (Mahmoodi et al., 2018). So far, results remain scarce and mixed (Logg et al., 2019; Sanfey et al., 2003; van 't Wout et al., 2006). More importantly, the reasons for reduced cooperation with machines remain unexplained. In repeated interactions, machines can learn to induce cooperative behavior in the long run as long as people are under the impression that they interact with another human (Crandall et al., 2018), but cooperation collapses as soon as people know that their partner is a machine (Ishowo-Oloko et al., 2019). Most recently, a non-disguised, verbally communicating robot was able to achieve more efficient coordination than humans managed among themselves, yet, overall, people still cooperated with it less than with fellow humans (Whiting et al., 2021).

Why do so many people refuse to cooperate when interacting with machines? Beyond open qualitative hypotheses (that algorithms are opaque, perceived as ill-suited to social interaction, or treated as direct competitors on the job market), our previously reported work on AI agents' ability to coordinate actions with humans in complex decision tasks led us to propose that reduced cooperation is due to people anticipating selfish, non-cooperative AI agents. Our first hypothesis (H1) focuses on this anticipation: cooperation can bring benefits but is risky because the other party may fail to act cooperatively, and people may "predict" defection as a consequence more readily when facing machines.

Trusting the other to cooperate is, however, a necessary but not sufficient condition for cooperation. Among the theoretical explanations of tacit cooperation between people, some suggest that people hold prosocial preferences (Bolton and Ockenfels, 2000; Fehr and Schmidt, 1999; Rabin, 1993), use distinct modes of reasoning (Colman and Gold, 2018; Karpus and Radzvilas, 2018; Misyak and Chater, 2014; Sugden, 1993), or are influenced by social norms upheld by mild forms of punishment (Bicchieri, 2006; Binmore, 2010). For example, internalized feelings of guilt or the unpleasantness of receiving angry looks from other drivers uphold cooperative behaviors in traffic. Many such punishments are unlikely to affect a machine that cannot give an angry look. Similarly, altruistic preferences that are specific to sentient others may dissipate when the partner is a machine. Therefore, our second hypothesis (H2) is "algorithm exploitation": even when people anticipate a benevolent AI, they are more inclined to exploit the other's cooperativeness than when it comes from a human. Pushed to the extreme, if reluctance to hurt someone's feelings fully explains cooperation, H2 implies that people have no qualms about exploiting a non-sentient machine (which is indeed the rational thing to do in such situations). In a weaker form, it is compatible with people maintaining a certain level of cooperation with a fellow human while withholding it from an AI agent.

Both hypotheses offer independent explanations of reduced cooperation with machines. H1 suggests that people do not trust AI agents to cooperate; H2, that people expect AI agents to be ready to cooperate but exploit their cooperativeness for personal gain. Importantly, one-shot games make the two relatively easy to test: games that allow history-contingent strategies bring reputational concerns into play and make it difficult to directly pit H1 against H2, and a combination of both factors could play a role in explaining overall reduced cooperation. To disentangle which of the hypotheses hold, participants, who were informed about whether their co-player was another human or an AI agent, played four well-known games: Trust, the Prisoner's Dilemma, Chicken, and Stag Hunt (Figure 1A), as well as the newly designed Reciprocity game.
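Because H1 and H2 make different predictions about the same choices, they can be separated with two simple summaries of the data: H1 predicts lower rates of expecting cooperation from an AI co-player, whereas H2 predicts similar expectations but less cooperation among participants who do expect their co-player to cooperate. The Python sketch below only illustrates that logic; the record structure, field names, and example numbers are hypothetical and are not the study's materials or analysis code.

```python
# Illustrative sketch of how H1 and H2 can be separated in data of this kind.
# Each record is (co-player type, expected co-player cooperation, own cooperation);
# the field names and example values are hypothetical.
from dataclasses import dataclass

@dataclass
class Decision:
    partner: str             # "human" or "AI"
    expects_cooperation: bool
    cooperated: bool

def rate(xs):
    xs = list(xs)
    return sum(xs) / len(xs) if xs else float("nan")

def summarize(data: list[Decision], partner: str) -> dict:
    sub = [d for d in data if d.partner == partner]
    return {
        # H1 concerns this quantity: do people expect the partner to cooperate?
        "expects_cooperation": rate(d.expects_cooperation for d in sub),
        # H2 concerns this one: do people cooperate *given* that expectation?
        "cooperates_if_expecting": rate(
            d.cooperated for d in sub if d.expects_cooperation
        ),
    }

# Hypothetical pattern consistent with the paper's conclusion: equal expectations,
# but less reciprocation toward the AI partner.
example = (
    [Decision("human", True, True)] * 7 + [Decision("human", True, False)] * 2
    + [Decision("human", False, False)] * 3
    + [Decision("AI", True, True)] * 4 + [Decision("AI", True, False)] * 5
    + [Decision("AI", False, False)] * 3
)
for p in ("human", "AI"):
    print(p, summarize(example, p))
```

Conditioning the second summary on expectations is what separates exploitation (H2) from mistrust (H1) in the experiments reported below.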
In one set of experiments, participants were instructed that the AI system had been developed to reason similarly to humans. This was veridical, since the agents' choices strictly emulated the distribution of human choices in each game (see STAR Methods). In each game, the two players, deciding independently and without communicating with one another, had two options, identified as ★ and ☆. Their choices jointly determined one of three possible outcomes, each associated with a particular distribution of points between the players. The cooperative choice was always ★, with cooperation meaning both players choosing ★ and non-cooperation or, for short, defection meaning the choice of ☆. Across these games, participants faced different forms of a social dilemma, presenting the pursuit of self-interest at varying levels of risk. As we explain shortly, the Trust game is well suited to uncover algorithm exploitation when the partner's cooperation is known with certainty, whereas the Prisoner's Dilemma and Chicken investigate cooperation when the partner's choice is unknown. Lastly, the first setting is revisited in the Reciprocity game, which frames the co-player's kindness differently than before.

We first asked how people respond after an unquestionably benevolent (human or AI) co-player has been kind to them. In experiment 1, 403 participants played the Trust game, each paired with a co-player who was either another human or an AI agent. The player with the first move could defect and end the game outright (play ☆), leaving 30 points to each player, or take a chance and cooperate (play ★). If the first player cooperated, the second player chose the final outcome: defect (play ☆) and collect 100 points for her/himself alone, or respond in kind (play ★), whereby both players earned 70 points. Thus, cooperating would pay off for the first player only if she/he expected the co-player to respond in kind, but the prospect of a higher payoff (100 instead of 70 points) could tempt the second player to defect and exploit the trust of the first player. Behavior toward human co-players replicates earlier findings (McCabe et al., 2003): a majority of participants (74%) trusted a human co-player with the first move, and a majority (75%) responded to a human's trust in kind (Figure 1B). With AI agents, the rate of trusting (78%) did not differ between the two co-player types, but the rate of returning the co-player's cooperation (34%) was significantly lower (Χ² (d.f. = 1) = 26.077, p = 0.000000164). Despite this difference in behavior, expectations about the co-players' choices were the same: 79% and 83% of participants expected their human and AI co-player, respectively, to cooperate (Figure 2A; Χ² (d.f. = 1) = 0.452, p = 0.501; two-tailed test; here and below, one-tailed tests were used whenever we had a directional prediction about the co-player effect), and predictions that the co-player would return cooperation were likewise indistinguishable (56% vs. 55%; Χ² (d.f. = 1) = 0.012, p = 0.456; one-tailed test). These results lend no support to H1: people trusted the AI agent as much as the human but, given the opportunity, were keen to exploit it. Note that exploiting a trusting co-player served participants' own material interest in this game as well: they acted after their co-player had already chosen, so the exploitative option was guaranteed to yield the higher of the two payoffs. Participants had the same opportunity to exploit a trusting human, yet far fewer did.
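To make the payoff structure of experiment 1 concrete, here is a minimal sketch of the Trust game as quoted above (30 points each after an outright ☆, 70 each after mutual ★, 100 for a second player who defects on a trusting first mover), together with an AI co-player that simply samples its move from an observed distribution of human choices, in the spirit of the emulation described in the STAR Methods. The payoff of an exploited first mover is not stated in this excerpt, so EXPLOITED_PAYOFF, the function names, and the 75/25 example sample are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the one-shot Trust game as described
# above and of an AI co-player that emulates the distribution of human choices.
# The 30/70/100 values come from the text; the payoff of a first mover whose
# trust is exploited is NOT given in this excerpt, so it is a placeholder.
import random

EXPLOITED_PAYOFF = 0  # assumption: not reported in the excerpt above

def trust_game(first_move: str, second_move: str) -> tuple[int, int]:
    """Return (first player points, second player points) for one play."""
    if first_move == "defect":          # ☆ ends the game outright
        return 30, 30
    if second_move == "cooperate":      # ★ then ★: mutual cooperation
        return 70, 70
    return EXPLOITED_PAYOFF, 100        # ★ then ☆: the second player exploits

def emulating_agent(human_choices: list[str]) -> str:
    """AI second player that samples its move from observed human choices,
    i.e., it reproduces the empirical human choice distribution."""
    return random.choice(human_choices)

# Example: an AI second player built from a (hypothetical) sample in which 75%
# of human second players returned cooperation.
observed = ["cooperate"] * 75 + ["defect"] * 25
print(trust_game("cooperate", emulating_agent(observed)))
```

Sampling from the empirical human distribution is what makes the claim that the agent "reasons similarly to humans" veridical at the level of choice frequencies.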
One might suppose that making the risks symmetric would reinstate cooperation with AI. To test this, we conducted experiments 2 and 3, in which 201 and 204 participants played the Prisoner's Dilemma and the Chicken game, respectively, choosing simultaneously with their co-player and without knowing what the co-player had chosen. In both games there is an incentive to defect against a cooperator. Again, in line with what is known about human-human play (Camerer, 2003), about half of the participants (49%) cooperated with a human co-player in the Prisoner's Dilemma and 69% did so in Chicken. When the co-player was an AI agent, cooperation rates were significantly lower: 36% in the Prisoner's Dilemma (Χ² (d.f. = 1) = 3.673, p = 0.028) and likewise in Chicken (Χ² (d.f. = 1) = 3.818, p = 0.025). Expectations of the co-player's cooperation were again comparable between co-player types: 59% versus 52% in the Prisoner's Dilemma (Χ² (d.f. = 1) = 1.148, p = 0.142) and 67% versus 70% in Chicken (Χ² (d.f. = 1) = 0.172, p = 0.679). Among participants who expected their co-player to cooperate, 54% and 57% cooperated with the AI agent in the two games, compared with 71% (Χ² (d.f. = 1) = 3.568, p = 0.029) and 73% (Χ² (d.f. = 1) = 3.799, p = 0.026), respectively, of those paired with humans (Figure 2B). These results support H2 rather than H1 and extend the previous finding to decisions with symmetrically distributed risk: people exploited the anticipated benevolence of an AI agent both when defection was the dominant choice (in the Prisoner's Dilemma) and when it paid off only against a cooperator (in the Chicken game). (When a player expects the co-player to cooperate, defecting yields the higher personal payoff in either game.) Further data analysis excludes the possibility that (in the Prisoner's Dilemma and Chicken) participants were merely less certain about the AI's choice than about a human's: there was no significant difference in participants' confidence ratings of the predictions they made about the AI vs. the human co-player's choice (Data S5 in the supplemental information).

According to H2, people exploit the anticipated benevolence of AI for personal gain. Though this treatment of AI occurred across generic games, it could also come from a heightened competitive desire to outperform machines, since deciding against the partner brought a positive comparative advantage. To check whether a motivation in terms of social comparison is enough to account for the results, we tested whether people would leave an AI agent worse off at a cost to their own gains. In experiment 4, 205 participants were presented with the Stag Hunt game and asked to engage in one-shot play. Here there is no temptation payoff for unilateral defection. Cooperation is risky: if the co-player defects, a cooperator earns zero points, while defecting yields a small but safe payoff; if the co-player cooperates, then cooperating is the best choice for the other player as well. Deciding for the smaller safe payoff against a cooperating co-player would leave that co-player worse off, but mostly at a cost to one's own gains. If reduced cooperation came mostly from a desire to outperform machines, participants should still have defected against the AI; alternatively, if maximizing one's own payoff remains key, as posited by H2, cooperation should not decline. Indeed, a large majority (86%) cooperated with a human co-player, and there was no significant difference with an AI co-player (80%; Χ² (d.f. = 1) = 1.181, p = 0.139). Participants' expectations were again the same for the two types of co-player: 78% expected their co-player to cooperate (Χ² (d.f. = 1) = 0.025, p = 0.875). Although a slight drop occurred among participants who predicted that their co-player would cooperate (91% vs. 98%; Figure 2B; Χ² (d.f. = 1) = 3.144, p = 0.038), see Data S6 in the supplemental information for further discussion of this difference and its relation to participants' predictions.

People thus do not strive to outperform machines as such: the main effect was observed only in settings where something could be gained from unilateral defection. When there was little to gain from unilateral defection, participants worked with the AI agent to attain the mutually beneficial result; they opted for exploitation only when it served their own interest. Further analyses of the additional information we collected about participants' motives are reported in Data S7 and Table S2 of the supplemental information.

In four additional experiments, with 422 and 214 participants, we varied how the AI co-player was described. Half of the participants were told that the money the agent earned went to an institution that it represented (the institutional treatment); the others were told nothing about how the agent was programmed beyond a short description. Participants' willingness to exploit the AI again remained as before, a pattern we attribute to a mixture of factors, including a somewhat diminished aversion to the risks involved (plausible during the COVID-19 pandemic; Data S1 and Figure S3 in the supplemental information). While both parties end up better off when the first player in the Trust game cooperates and the second reciprocates (compared with when the first player defects), the first player also stands to gain individually. As such, trusting does not "necessarily" constitute kindness: it can be driven by the trustor's own interest (Isoni and Sugden, 2019)…
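The Χ² statistics with one degree of freedom quoted throughout compare two proportions, that is, a 2×2 table of co-player type (human vs. AI) against the participant's choice (cooperate vs. defect). Below is a minimal sketch of that computation; the cell counts are hypothetical stand-ins, since the full contingency tables are not reproduced in this excerpt.

```python
# Generic Pearson chi-square for a 2x2 table, the kind of test behind values such
# as "X^2 (d.f. = 1) = 26.077". The counts below are hypothetical, not the study's data.

def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]],
    e.g., rows = co-player type (human, AI), columns = (cooperate, defect)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 75 of 100 participants cooperate with a human co-player,
# 34 of 100 with an AI co-player.
stat = chi2_2x2(75, 25, 34, 66)
print(f"chi2(1) = {stat:.3f}")  # exceeds 3.84, the 5% critical value for 1 d.f.
# scipy.stats.chi2_contingency([[75, 25], [34, 66]], correction=False) returns the
# same statistic together with the p value.
```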
Similar resources
Benevolent Characteristics Promote Cooperative Behaviour among Humans
Cooperation is fundamental to the evolution of human society. We regularly observe cooperative behaviour in everyday life and in controlled experiments with anonymous people, even though standard economic models predict that they should deviate from the collective interest and act so as to maximise their own individual payoff. However, there is typically heterogeneity across subjects: some may ...
Comparing Humans and AI Agents
Comparing humans and machines is one important source of information about both machine and human strengths and limitations. Most of these comparisons and competitions are performed in rather specific tasks such as calculus, speech recognition, translation, games, etc. The information conveyed by these experiments is limited, since it portrays that machines are much better than humans at some d...
Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures
The goal of the field of Artificial Intelligence is to understand intelligence and create a human-equivalent or transhuman mind. Beyond this lies another question—whether the creation of this mind will benefit the world; whether the AI will take actions that are benevolent or malevolent, safe or uncaring, helpful or hostile. Creating Friendly AI describes the design features and cognitive archi...
IVE: Virtual Humans’ AI Prototyping Toolkit
IVE toolkit has been created for facilitating research, education and development in the field of virtual storytelling and computer games. Primarily, the toolkit is intended for modelling action selection mechanisms of virtual humans, investigating level-ofdetail AI techniques for large virtual environments, and for exploring joint behaviour and role-passing technique (Sec. V). Additionally, th...
Journal
Journal title: iScience
Year: 2021
ISSN: 2589-0042
DOI: https://doi.org/10.1016/j.isci.2021.102679